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REGULARIZED COMPUTATION OF APPROXIMATE 
PSEUDOINVERSE OF LARGE MATRICES USING LOW-RANK 
TENSOR TRAIN DECOMPOSITIONS 

NAMGIL LEEt AND ANDRZEJ CICHOCKlt 

Abstract. We propose a new method for low-rank approximation of Moore-Penrose pseudoin¬ 
verses (MPPs) of large-scale matrices using tensor networks. The computed pseudoinverses can be 
useful for solving or preconditioning of large-scale overdetermined or underdetermined systems of lin¬ 
ear equations. The computation is performed efficiently and stably based on the modified alternating 
least squares (MALS) scheme using low-rank tensor train (TT) decompositions and tensor network 
contractions. The formulated large-scale optimization problem is reduced to sequential smaller-scale 
problems for which any standard and stable algorithms can be applied. Regularization technique is 
incorporated in order to alleviate ill-posedness and obtain robust low-rank approximations. Numer¬ 
ical simulation results illustrate that the regularized pseudoinverses of a wide class of non-square or 
nonsymmetric matrices admit good approximate low-rank TT representations. Moreover, we demon¬ 
strated that the computational cost of the proposed method is only logarithmic in the matrix size 
given that the TT-ranks of a data matrix and its approximate pseudoinverse are bounded. It is 
illustrated that a strongly nonsymmetric convection-diffusion problem can be efficiently solved by 
using the preconditioners computed by the proposed method. 

Key words. Alternating least squares (ALS), density matrix renormalization group (DMRG), 
curse of dimensionality, solving of huge system of linear equations, low-rank tensor approximation, 
matrix product operators, matrix product states, preconditioning, generalized inverse of huge matri¬ 
ces, tensor networks, big data. 
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1. Introduction. In this paper, we consider the approximate numerical solution 
of very large-scale systems of linear equations 

(1.1) Ax = b, A e R lxj , beR 7 . 

In the case that both the number of equations I and the number of unknowns J 
have an exponential rate of increase, e.g., I = P N and J = Q N with TV > 20 and 
P, Q > 2, standard numerical solution methods cannot be applied directly without 
exploiting structure of the matrix such as the sparsity due to high computational and 
storage costs. Such an exponential increase with size of the matrix is referred to as 
the curse of dimensionality. For example, standard numerical methods for solving 
high-dimensional partial differential equations often become intractable as the dimen¬ 
sionality of the involved operators and functions increases. We consider structured 
matrices with billions of rows and columns and beyond, without any assumption that 
the data matrix is sufficiently sparse. 

In order to break the curse of dimensionality, low-rank tensor decomposition tech¬ 
niques are gaining growing attention in numerical computing and signal processing 
[6, 9, 10, 23, 33, 52]. A tensor of order N is an iV-dimensional array. Higher-order 
tensors arise from various sources such as multi-dimensional data analysis [11, 34] 
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and high-dimensional problems in scientific computing [23, 33]. Tensor decomposi¬ 
tion techniques transform such higher-order tensors into low-parametric data repre¬ 
sentation formats. Comprehensive surveys about traditional tensor decomposition 
methods such as CANDECOMP/PARAFAC (CP) and Tucker formats are provided 
in [11, 34]. However, traditional tensor decomposition methods have limitations in 
high-order tensor approximation and numerical computing [23, 33]. On the other 
hand, modern tensor decomposition methods such as the hierarchical Tucker (HT) 
[22, 26] and tensor train (TT) formats [41, 44] are promising tools for breaking down 
the curse of dimensionality. For instance, once large-scale matrices and vectors are 
reshaped into higher-order tensors and represented approximately in TT format (e.g., 
see [30, 32]), basic algebraic operations such as addition and matrix-by-vector mul¬ 
tiplication can be performed with logarithmic storage and computational complexity 

[41]. 

We mostly focus on the TT format, which is equivalent to the matrix product 
states (MPS) with open boundary conditions (OBC) in quantum physics [51]. Algo¬ 
rithms for solving optimization problems using TT formats, due to its simple linear 
structure of separation of variables (or dimensions), are often presented in simple 
recursive forms, please see, e.g., TT-rank truncation algorithm [41]. 

However, previous studies on numerical algorithms for solving systems of linear 
equations based on TT formats focus mostly on the cases of square data matrices, i.e., 
A G R /Xj with I = J; see, e.g., TT-GMRES [15], alternating least squares (ALS) 
[28], modified alternating least squares (MALS) [28], density matrix renormalization 
group (DMRG) [43], and alternating minimal energy (AMEn) [17]. In order to apply 
the existing algorithms to the case of non-square or strongly nonsymmetric coefficient 
matrices, the normal equation A 1 Ax = A T b can be considered, in which the solution 
is equivalent in the sense of minimum of the linear least squares problem. However, 
the matrix product A 1 A is often extremely ill-conditioned since the singular values 
of A are squared. The ill-conditioning can slow down the convergence rate and reduce 
accuracy of developed algorithms. 

Our main objective is to develop a new method to compute an approximation to 
the Moore-Penrose pseudoinverse (MPP) of A, i.e., P T « A' e M . jxl , and solve the 
preconditioned system 

(1.2) P t Ax = P T b, 

when I > J, or AP T y = b with the substitution x = P T y when I < J. The 
preconditioned matrix P T A (resp. AP T ) should be square and near symmetric, and 
have more uniformly distributed eigenvalues possibly far away from zero to improve 
the convergence property of an iterative method. 

We can compute an approximate MPP P T « A^ € M. jxl by minimizing the cost 
function 

(1.3) -Fa(P) = ||lj — P T A||p + A||P|||, A > 0, 

assuming without loss of generality that I > J. The objective function (1.3) with 
A = 0 has been considered widely for the computation of preconditioners by, e.g., 
sparse approximate inverse (SPAI) preconditioners [5, 24] and generalized approx¬ 
imate inverse (GAINV) preconditioners [12]. Most approximate inverse techniques 
are claimed to be largely immune to the risk of breakdown during the preconditioner 
construction [5]. The general case that A > 0 was considered by, e.g., the optimal 
low-rank regularized inverse matrix approximation [7, 8]. The regularization term is 
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helpful for alleviating ill-posedness of the minimization problem (1.3) and improv¬ 
ing the convergence property of the proposed algorithm. The simulation results in 
Section 4 illustrate that the regularization approach is also helpful for avoiding over¬ 
estimation of TT-ranks of the pseudoinverses. 

We propose a new method for computing the approximate generalized inverse P T 
in the form of a low-rank TT decomposition. Many of the sparse approximate inverse 
algorithms such as the SPAI preconditioners [24] assume that approximate inverses are 
sparse, however, inverses of sparse matrices are often not sparse. On the other hand, 
matrices represented in TT format do not have to be sparse, instead, they are required 
to have relatively small TT-ranks. It has been shown that inverses and preconditioners 
of several classes of important matrices such as Laplace-like operators and banded 
Toeplitz matrices admit approximate low-rank TT representations [4, 30, 31, 32, 45]. 

Holtz, Rohwedder, and Schneider [28] and Oseledets and Dolgov [43] developed 
DMRG (also called as MALS) methods for solving systems of linear equations with 
square matrices in TT formats. The application to matrix inversion is presented 
in [43]. However, the matrix inversion method proposed in [43] is applicable only 
to square matrices. And when we want to apply this approach, the linear matrix 
equation P T A = I j is converted to a larger linear least squares (LS) problem. On 
the other hand, the proposed method can be applied to general non-square matrices, 
whose pseudoinverses can be efficiently computed without a need to solve a larger 
linear LS problem. Finally, once an approximate pseudoinverse is computed, the pre¬ 
conditioned linear system can be solved by applying existing TT-based optimization 
algorithms such as the ones developed in [17, 42, 43]. Numerical simulation results 
illustrate that the approximate generalized inverses obtained by the proposed method 
provide usually low TT-ranks with a sufficiently small approximation error. The re¬ 
sulting preconditioned linear systems can be solved by existing TT-based algorithms 
much faster than linear systems without preconditioning. A wide class of structured 
matrices in TT format are considered and demonstrated to have low TT-rank precon¬ 
ditioners. 

The paper is organized as follows. In Section 2, we describe briefly mathematical 
representations for TT decomposition. In Section 3, we design a new tensor network 
and develop a MALS algorithm for computing approximate pseudoinverses. In Section 
4, experimental results are presented to demonstrate the validity and effectiveness of 
the proposed method. In Section 5, discussion and concluding remarks are given. 

2. The TT Decompositions. 

2.1. Notation. We will briefly describe notation for tensors and tensor opera¬ 
tions used in this paper. We refer to [11, 14, 34] for further details. Scalars, vectors, 
and matrices are denoted by lower-case letters (a, b , ...), lower-case bold letters (a, 
b, ...), and upper-case bold letters (A, B, ...), respectively. iVth-order tensors, i.e., 
N -way arrays (for N > 3), are denoted by calligraphic letters (A, B , ...). For a 
tensor X £ ffiNi x Ax■ ■ ■ x^ where I n is the size of the nth mode , the {i\, * 2 , • • •, ijv)th 
entry of X is denoted by Mode-n fibers of a tensor X £ R^x^x-xijv are 

column vectors determined by fixing all the indices except 

for the nth index. The mode-1 contracted product of tensors A £ K^xhx-x/j, and 
B £ K /|txJ 2 x Ax---x J m j s a binary operation defined by the tensor 1 

(2 1) A* B £ JfchxIaX — x/jv-iX J 2 xJ 3 X" XJ M 

1 We call for simplicity such operation mode-1 contraction, because the mode one of a tensor B 
is contracted with the mode TV of a tensor A. 



4 


N. LEE AND A. CICHOCKI 



(c) (d) 


Fig. 1. Tensor network diagrams for (a) vector, matrix, and third-order tensor, (b) mode-1 
contracted product of two third-order tensors, (c) left-orthogonalized and right-orthogonalized fourth- 
order tensors, and (d) trace( AB) for matrices A E M. lxj and B E M. JxI . 


with entries 


In 


{A*B) 




= J2 a 

*jv = 1 




The mode-1 contracted product is a natural generalization of the matrix-by-matrix 
product to higher-order tensors. Note that it has associativity: (A»B)»C = A»(B»C) 
for any tensors A, B, and C of proper sizes. 

Basic symbols for tensor are shown in Figure 1. In particular, symbols for vectors 
(lst-order tensors), matrices (2nd-order tensors), and 3rd-order tensors are illustrated 
in Figure 1(a), while the mode-1 contraction of two 3rd-order tensors is illustrated in 
Figure 1(b). 

Indices ■ ■ ■ An, which run through i n = 1, 2,..., I n , can be grouped in a 

single multi-index ifi 2 ■ • - in- 2 The following two notations are defined based on the 
multi-index. 

Definition 2.1. For a fixed n £ {1,2,... ,1V}, the nth canonical matricization 
of a tensor X £ K / i x '" x/jv is defined by the matrix [27] 

(2.2) X {n} (zWiFh-InXln+l-lN, 

with entries 


(X { „ } )i 


*n + l"-1iV 


= Xi 




Definition 2.2. For a fixed n £ {1,2,..., TV}, the mode-n matricization of a 
tensor X £ M 7iX '"X j jv defined by the matrix [34] 

(2.3) X (n) £ R/nX/l/2-/n- 1 /n +1 -/i V) 

with entries 


0^(n))i ni i 1 ...i n _ 1 i n+1 ...i N x ii,i 2 ,--- ,iiv ■ 


2 The multi-index can be defined by either the big-endian i\i 2 • • - iN = + (ijv—l — 1 )In + 

• • • + {fi — l)/ 2-^3 • • • In or the little-endian i\i 2 • • -ijv = ii + (*2 — l)ii H— • + (ijv — l)/i /2 ■ • • In—i- 
In this paper, we use the little-endian convention unless otherwise mentioned. 
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Note that the columns of the mode-n matricization X( n ) are the mode-n fibers of X. 
Moreover, the vectorization of a tensor X £ — j s defined as the Nth canonical 

matricization 


(2.4) 


vec(X) = *-{n} S R Ill2 '" lN ^ 


whose entries are (vec(A)) n i2 ,,, iN = x ilti2i ... iiN . 

2.2. The HT Format. The hierarchical Tucker (HT) and tensor train (TT) 
formats are low-parametric representations for vectors, matrices, and higher-order 
tensors. The HT format, introduced by Hackbusch and his coworkers [25, 26], can be 
considered as a more general model than the TT format, but most techniques for TT 
format can be extended to HT format and the generalization is often straightfoward, 
e.g., see [36]. 

Let V = <g) • • ■ ® V be a tensor product space, where V^ (1 < n < N ) are 

Hilbert spaces. For example, we can consider that = R /n with the innerproduct 
defined by (v,w) = v T w, or V^ = R /nXl/n with (V, W) = trace(V T W). For 
TV > 1, a binary tree T/v is called a dimension tree if 

(a) all nodes t £ T/v are non-empty subsets of {1,..., TV}, 

(b) the set t root = {1,..., TV} is the root node of T/v, and 

(c) each node t £ T/v with |t| > 2 has two children ti,ti £ Tjj such that t is a 
disjoint union t = t\ U t 2 - 

A dimension tree determines recursive partitioning of the modes {1,..., TV}, e.g., see 
Figures 2(a) and (b). We denote the set of all leaf nodes by L C T/v and the set of 
children of a non-leaf node t by S b) C T/v. 

Let (i?t)teTiv be positive integers with R troot = 1. An element x £ V is an HT 
tensor with T/v-rank bounded by (i? t ) tg t n if 

(a) there exists a finite dimensional subspace = spanjurt"* : 1 < r t < R t } C 
yb) f or each leaf node t £ L, and 

(b) there exists a coefficient tensor £ ^ R t 1 xR t2 xR t f or eaeh non-leaf node 
t £ Tn\L, such that 

(c) intermediate vectors u;.]; 1 £ (£) net (1 < r t < Rt) for each non-leaf node 
t £ Tn\L are defined recursively by 


(2.5) 


Rt- 


u« = 


i?t 2 

E 

r*t 1 =lr t2 =l 






Vr tl ,rt 2 ,rt “ r tl 


• U 


(* 2 ) 


{h,t 2 } = s®, 


and x is represented by x = u^ root \ 

For example, if the tree T/v is the binary tree as illustrated in Figure 2(a), then a 
corresponding HT tensor x £ V can be written as 


x = u 


({ 1 , 2 , 3 , 4 }) 
1 


V „({1,2>3,4}) „({1,2}) 

/ . i'1'12 ,r 3 4,ri234 Uri,r 2 ,ri2^r 3 ,r4,r 34 




® U 


( 2 ) 
r 2 


<S> U 


(3) 

r 3 


<8> u 


(4) 

r 4 ‘ 


This rather abstract representation in (2.5) can be immediately applied to the cases 
when y<» = or V (n) = R InX Jn . 

2.3. The TT Format. The TT format, introduced in scientific computing by 
Oseledets and his coworkers [41, 42, 43, 44, 45], can be considered as a special case of 
the HT format when the dimension tree has a linear structure, such as the example 
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{1,2,3,4} 



{1} {2} {3} {4} 


(a) 


{1,2,3,4} 
{1W2^4> 
{2} {3,4} 


{3} {4} 

(b) 


Fig. 2. Illustration of two typical examples of dimension trees for HT tensors of order N = 4. 
(a) A balanced tree structure which can generate an HT format and (b) an unbalanced tree structure 
which can generate a TT format. 


in Figure 2(b). The TT representations for large scale vectors, matrices, and higher- 
order tensors can also be derived from the recursive definition of the HT format in 
(2.5). 

If we suppose that V^ = K /ra , then a large scale vector x G R Ji '" In = V can be 
represented in TT format by sums of Kronecker products 


( 2 . 6 ) 


Ri R 2 Rn- 1 

*=££-- £ 4£®x<a, 

n = l r 2 =l rjv-i=l 




1 > 


where G R /n are mode-2 fibers of 3rd-order tensors X^ G x/„x_R„_ 

The 3rd-order tensors X^ are called TT-cores, R\, R 2 , ■. ., Rn-i are called TT- 
ranks, and we define Rq = Rn = 1. We assume for convenience that the first and the 
last TT-cores (matrices) are also 3rd-order tensors of size 1 x Ii x Ri and -Rjv- 1 x I/v x 1, 
respectively. We call the TT representations in (2.6) (for large-scale vectors) as the 
vector TT format, which is equivalent to the MPS with open boundary conditions 
(OBC) (see Figure 3(a)(top).) 

A higher-order tensor X G R 7lX "' xJjv is said to be represented in TT format if 
its vectorization vec(A) G R Ii "' In is represented in TT format (2.6), i.e., x = vec(ff). 
In this case, the tensor X can be expressed by the mode-1 contracted products of 
3rd-order TT-cores (see Figure 3(a)(top)) 

(2.7) A = A (1) #T (2) *---*A (a0 , 

where X^ G R 7? ™-i x7 «x.r„ are the 3rd-order TT-cores. 

Note that TT decompositions (2.7) of a tensor X may not be unique, and the 
TT-ranks i?i,..., Rn-i depend on a specific decomposition. From (2.7), we have 
ranfc(X{„}) < R n , n = 1,..., N — 1. However, if a TT decomposition of a tensor X 

is minimal , i.e., all TT-cores have full left and right rank as rank(X.^) = R n and 

rank(X. |”j) = R n - 1 , then the TT-ranks of minimal TT decompositions are unique 
and satisfy rank(X.{ n y) = R n , n = 1,..., N— 1. The TT-ranks of a tensor X is defined 
as the TT-ranks of a minimal TT decomposition. See [25, 27] for more details. 

The storage cost can be significantly reduced if large-scale vectors and matrices 
can be approximately represented in TT formats with relatively small TT-ranks. For 
example, the storage complexity for a vector x G R il ''' Jjv represented in TT format is 
0(NQR 2 ) with Q = max„(/ n ) and R = max n (I?„) [41]. 

On the other hand, if we suppose that W ra ' = then a large scale matrix 

A G rA-'Lvx= y can k e represented in TT format as sums of Kronecker 
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*v(i) r r(i) 

•yv «yV/ •yV/ 



/id) 4 ( 2 ) 4 (AO 

VX i/X i/T 



(b) 


Fig. 3. Tensor network diagrams for TT decomposition of large-scale vectors and matrices, (a) 
Large-scale vector in vector TT format, which is equivalent to either the MPS with open boundary 
conditions (OBC) (top) or the MPS with periodic boundary conditions (PBC) (bottom), and (b) 
large-scale matrix in matrix TT format, which is equivalent to either the MPO with OBC (top), or 
MPO with PBC (bottom). 


products 


( 2 . 8 ) 


A = 


Rt 

E 


E 


TdA 

^N — 1 

E 


A 


(i) 

1 


>(2) 


) A 


(JV) 


.A . . i 7 

JV_ ivn 1 


where A (, 4 . £ are the slice matrices of 4th-order core tensors A*-"-* £ 

r n-l’ : ’ : ’ r n 

-1 xIn x J„ x. Rt ; and Rf , ,..., R%_ 1 are TT-ranks with R$ = = 1. The TT 

format for large-scale matrices in (2.8) is equivalent to the matrix product operators 
(MPO) in quantum physics [511, and we refer to it as the matrix TT format (see 
Figure 3(b)(top)). 

2.4. Extraction of Core Tensor for Matrix TT Format. For practical 
and theoretical purposes, we will introduce alternative representations for large-scale 
matrices based on the matrix TT format or MPO, which will be used for describing 
the proposed algorithm. 

The vectorization, which is an operation on matrices, can be extended to ma¬ 
trices represented in matrix TT format as follows. Let I = I\ I2 ■ ■ ■ In and J = 
J\ J‘2 ■ ■ ■ Jn ■ For a matrix A £ R /Xj in matrix TT format (2.8) with TT-cores 
A xi«xJ„xfi^ the extended vectorization of A can be defined by the 
vector in TT format, with slight abuse of our notation, 


(2.9) vec(A) 


R? 


r, —1 


tdA 


V(, C! A ( 1 rA ) 


) vec(A 


r A _i 
' N -1 —1 


(JV) 

A 

N-: 


,:,l) S 


p IJ 


i.e., each of the 4th-order TT-cores A^ is reshaped into the 3rd-order TT-core of 
size R n — 1 X IjiJn X Rn • 

Let 


p = vec(P) £ R IJ 


(2.10) 








N. LEE AND A. CICHOCKI 


9 


<n 


rp(n,n+\) 


9 


>n +1 


rp£) 


y(r>) 

<p("+i) 

j)(;i+2) >J>(A0 

m '£ 

^/7-1 

■ 


Vi ■^h+? V-lBB 

■■ 

■ ■ 

■ I 

II 

I 11 w 


Vi 


^n-n- 


^lf^n 


^n+l^n+X 


In+lJn+2 




Fig. 4. The TT format for the Nth-order tensor V in (2.12). The TT-cores are grouped into 
three sets and their contractions are denoted by V <n , 'p ( n > n + 1 ) f and p> 71 + 1 . 


denote the extended vectorization (2.9) of a large-scale matrix P E IR /xJ in matrix 
TT format whose 4th-order TT-cores can be reshaped into 3rd-order cores 

(2.11) p(") eK^-'X^x^, K n = I n J n , Vn, 

for the corresponding vector p. From the expression (2.7), the TVth-order tensor 
V £ R-^i x '" xK n determined by p = vec(V) can be written as the mode-1 contracted 
products (see Figure 4) 

•p = -pi 1 ) a p( 2 ) * ... * p( N ) g A'i x k 2 x ■ ■ ■ x Kn ' 

For n = 1,2,..., TV, the mode-1 contracted products of left TT-cores and right TT- 
cores are respectively denoted by 

p <n _ p(l) # p(2) # . . . , p(n- 1) g gKiX-xJr„_iXfl„_i 

P> n _ p(n+1) # p(n+2) # # p(N) g jii„xJf n+ iX-xJf« 

and we define 7 ?<1 = P >iv = 1. The TVth-order tensor V £ R Al x---xk n can ^g 
rewritten by 

(2.12) V = V< n • P ( ”’ n+1) ,p> n +\ n = 1, 2,..., iV — 1, 

where 

p(n,n+ 1) _ •p(ra) # p(ra+l) g gfl„-iXif n xJfn+iXii„+i 

is the mode-1 contracted product of the two neighboring TT-cores. The so-called 
frame matrix [28, 35] for our model is defined by 

(2.13) = (P>” +1 ) T <g>V +1 ®I Kn ® (P^) T € M^xfl n _ 1 A n if„ +1 fl n+1) 

where P^" £ M Rn - lXKl '" Kn ~ 1 and P^" +1 £ R A " +1 xk„ +2 --k n are moc j e _ n anc j 

mode-1 matricizations of the tensors V <n and p >ra + 1 ; respectively. 

From the expressions (2.12) and (2.13), we can derive an expression for the vector 
p = vec(P) (2.10) as a product of the frame matrix and a local vector p„: 

(2.14) p = P^p„ e R /J , n= 1,2,..., iV — 1, 


where 


p n = vec(P (n ’ n+1) ) e RRn-iK n K n+1 R n+1 
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is the vectorization of the merged TT-core 'p( n ’ n + 1 \ 

In order to simplify computation and improve efficiency of our algorithm, we need 
to orthogonalize core tensors. For this purpose, left- or right-orthogonalization of the 
3rd-order TT-cores V^ G ^R n -ixK n xR„ j g defined as follows [27]. 

Definition 2.3 (Left- or right-orthogonalization [27]). A 3th-order tensor U G 
jj/xJxif called, left-orthogonalized if its mode-3 matricization U( 3 ) G ’R KxIJ has 
orthonormal rows as 


U(3)U? 3) =I 

and right-orthogonalized if its mode-1 matricization Um G M. IxJK has orthonormal 
rows as 


U(i)Uf 1) =I / . 

_ r n \ 

We note that the left- or right-orthogonalization of 4th-order TT-cores V G 
jiiji-ixinxJnX Rn can be (i e fi ne( i by the left- or right-orthogonalization of the reshaped 
3rd-order TT-cores V^ G where K n = I n J n . The left- or right- 

orthogonalized 4th-order tensors 'P' n ' 1 G R fl ".-i xi n x .A» x -Rn are denoted by the tensor 
network diagram shown in Figure 1(c). We can show that the mode-n matricization 
P5 1 ] and the mode-1 matricization P))? +1 have orthonormal rows if the TT-cores 
pW,..., p( n_1 ) are left-orthogonalized and p("+ 2 ),..., p( N ) a re right-orthogonalized 
[38]. Consequently, the frame matrix P^ in (2.13) will have orthonormal columns if 
the TT-cores are properly left- or right-orthogonalized. 

3. Computation of Approximate Pseudoinverse Using TT Decompo¬ 
sition. Without loss of generality, we assume that I > J for a large-scale matrix 
A G R 7xJ with I = IiI 2 ---In and J = J 1 J 2 ■ ■ ■ Jn- We formulate the following 
optimization problem: for A > 0, 

min lllj- — P t A||J + A||P||2 
(3.1) p " " F 

s.t. pgT< s c 1 /xJ , 

where denotes the set of TT tensors of TT-ranks bounded by rank R = (i?i,..., 

Rn-i)- We denote the cost function by Fa(P) = ||lj — P 1 A||p + A||P|||. We assume 
that the matrix A G R 7xJ is given in matrix TT format (2.8). 

3.1. Modified Alternating Least Squares (MALS) Algorithm. In the 

MALS scheme, for each n G (1,2,..., N— 1} at each iteration, only the n and (n+l)th 
TT-cores are optimized while the other TT-cores are fixed. The large-scale optimiza¬ 
tion problem (3.1) can then be reduced to a set of much smaller scale optimization 
problems as we explain below. 

Note that the cost function in (3.1) can be expressed, in matrix form, as 

F x { P) = J + trace (p t AA t P - 2P T A + AP t p) . 

Let p = vec(P) G R 7 ' 7 denote the extended vectorization (2.9) of the matrix P G 
R 7xJ . From the matrix TT representation (2.8), we can derive that 

vec(A r P) = (Ij®A T )p = (Ij®A) T p, 


(3.2) 
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where Ij®A £ R /l7Xl72 denote the (extended) Kronecker product defined by Kro- 
necker products between core tensors as 




Ij®A = £ ■■■ £ (I 


J i 


L (1) 

A 


) 


(i 


J N 


r N-1 


^iJ- 


Using the expressions (3.2), we can express F\( P) in the vectorized form as 

F\(p) = J + p T (Ij<g>AA r )p - 2p T a + Ap 1 p, 

where a = vec(A) £ R /J . From the expression (2.14) and the orthogonality of the 
columns of the frame matrix P^, F\( P) can be further simplified as 

(3.3) F\ (pn) = J + p^A„p„ - 2p^b„ + Ap^p„, 

where A n and b„ are the relatively small-scale matrices and vectors defined by 

(3.4) A„ = (P^) T (Ij®AA T )P^ £ R«n-lifnA n+ ifl n+ iXfl„_ 1 A n A n+1 H„ +1 

and 


(3.5) b ra = (P^) T vec(A) £ ^Rn-iK n K n+1 R n+1 , n = 


Finally, we can obtain a set of reduced linked optimization problems: 

min p^ A„p„ - 2p^b n + Ap^p„ 

(3.6) p " 

S.t. p„ = Vec(P (n ’ n+1) ) £ Rfln-l^nJf» + liln + l ) 

for n = 1,2,..., N — 1. It should be noted that the size of the matrix A„ can be 
much smaller than AA 1 under the condition that R n -1 and R n +1 are relatively low 
and bounded. 

Figure 5 illustrates a tensor network diagram representing the cost function 
trace(P T AA i P — 2P 1 A + AP T P) both in matrix and vectorized forms. The trace 
is indicated by the blue line connecting the start block with the end block (e.g., see 
Figure 1(d)). In order to solve the optimization problem (3.1), each matrix is rep¬ 
resented approximately in matrix TT format. The large-scale optimization problem 
can then be reduced to a smaller-scale optimization problem, where the reduced cost 
functions are described as p^A„p„ — 2pjb ra + Ap^Pn (n = 1, 2,..., N — 1). 

The relatively small-scale matrix A n and vector b ra in the reduced local problem 
can be computed efficiently by recursive contractions of the core tensors in the tensor 
network diagram in Figure 5. The recursive computation procedure for the tensor 
network contractions can be described as follows. Let Z^ 
and £ R^-^m-iX^^m be matrices defined by 


z ( m) = £ PJL m>: ®A ^ <g> a:® p^ ijm . 


v T»( m ) 


= E P'S., 

i-m ijm 


, ("») 


m = 1 ,..., N, 
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+ 
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4 
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■ 

1 

J„ 

l 

■ 4+i ■ 



Rn+l 

^4+i 


| •A 1 ”’ W >( " ) | 

+ 


» 11 

* ti 4 

l R„u J 




J»(n+1) 


r; 



Fig. 5. Conceptual tensor network diagrams for the trace(P T AA T P — 2P T A + AP T P) for the 
optimization of then and ( n-\-T)th TT-cores, i.e., 'p( n > n + 1 ) = 'p( n )#'p( n + 1 ) 5 m the MALS algorithm. 
The large-scale optimization problem is reduced to set of equivalent and much smaller optimization 
problems, which are expressed by minimization of cost functions: p^A n pn — 2p^b n + Apjp n with 
p n = vec('p( n,ri + 1 )) E WL Rn - lKnK n+i R n+i (1 < n < N — 1). The left- and right-orthogonalized 
TT-cores of P are indicated by the half-filled squares. 
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where P (m) G R^-ix^xJ m xR m and A (m) g R <_ lX / m xJ m x^ are the 4th _ order 
TT-cores for the matrices P,A £ R /Xj in matrix TT format (2.8). Note that the 
trace terms in the cost function F\(P) can be expressed by 

trace(P T AA t P) = zf } • • • z[ N) , trace(P T A) = Z^ ■ • • Z^°. 

Two 4th-order tensors and lZ> m and two matrices and RiT m with sizes 

£< rn g ^Rm-lXRm-lXRm-lXRrn-l -^>m g JjflmXii^XK^Xii,,, 

L^ m g x , R> m G R Rm x , 

are defined recursively by, for p = 1,2, 

(3.7) vec(£< 1 )=l, vec(£< m ) T = vec(£< m_1 ) T Z( ) m " 1) , m = 2,3, ...,n, 

(3.8) 

vec(lZp N ) = 1, vec (Up 171 ) = Z^, m+1 Vec (R> rn+1 ) , m = N — 1 ,N — 2 ,...,n + 1. 

The tensor defined in (3.7) can be efficiently computed by contractions of the 
tensors {£^ m_1 , A (m_1) , A (m_1) , pi™" 1 )}. The tensor TZ> m and matrices 

L^- m and Rj’" 1 can be computed similarly. 

Finally, the matrix A n can be computed by contractions of the tensors |£f", 
A( n \ A ^, A^ n+1 \ lZ^ n+1 }, as illustrated in Figure 5. The vector b n can be 

computed by contractions of the tensors {L^ n , A^ n \ A^ n+1 \ R^ n+1 }. In practice, 
however, the local matrix A„ is not computed explicitly, but the matrix-by-vector 
multiplication A„x for some x G M. Rn - lKnKn+lRn + 1 is used by standard iterative 
methods. See Section 3.2.8 for more detail. 

After the solution, the resulting merged tensor p( n > n+1 ) = p( ra ) • p( n + 1 ) is de¬ 
composed into two separate TT-cores via the <5-truncated SVD [41]: for the matrix 

p(n^n+l) e |fl,_ 1 if„xif„ + ifi, +1 

(3.9) [U 1 ,S 1 ,V 1 ] = 5m(pg] n+1) ), 

with the subsequent updates R n = mm(rank(XJi), R n ), P| 3 ) T = Ui, and P|^ +1 -* = 
SiV] 1 ". In this way, the TT-ranks can be adaptively determined during the iteration 
process, and the TT-cores can be left- or right-orthogonalized. 

The proposed MALS algorithm 3 for the computation of an approximate Moore- 
Penrose pseudoinverse of large-scale matrices is described in Algorithm 1. 

3.2. Properties and Practical Considerations. 

3.2.1. Existence and Uniqueness of Solution. The proposed algorithm can 
be applied to general singular or non-singular structured matrices A G M /x J provided 
in TT format. Moreover, it can be in quite straightforward way extended to the 
minimization of a more general objective function 4 

(3.10) E A (P;B)= ||B T -P T A||p + A||P|||, 

3 We have also developed an alternating least squares (ALS) algorithm (without merging two 
TT-cores), but its performance was slightly lower than the MALS algorithm presented here. 
4 Representing regularized LS solution for AX = B, where A = A T . 
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Algorithm 1: MALS algorithm for computing approximate pseudoinverse 

input : A e R 7 i/2 '" /nX j n matrix TT format, e > 0 (tolerance 
parameter), S > 0 (truncation parameter), 
output: P G Rii U-'-Tv x Ji j 2 ■ ■ ■ Jn j n ma ^ r i x XT format with TT-cores V'" ) 
(1 < n < N) and TT-ranks ■ • ■ ,Rn-i- 


1 Initialize P randomly with right-orthogonalized TT-cores V^\ ..., 'P^ N '>. 

2 Set Cp 1 = 1 ,p = 1,2. Compute Rp n , n = 2,3,..., N,p = 1,2 by (3.8). 

3 repeat 

for n = 1, 2,..., N — 2 do 
// Optimization 

Optimize p( ra > n + 1 ) by solving the local optimization problem (3.6). 

// Matrix Factorization by SVD 


6 

7 

8 
9 

10 

11 

12 

13 

14 


Compute ^-truncated SVD: [Ui,Si, Vi] = SVD s (p5”j n+1) ). 
Update R n = min(ranfc(Ui), R n )- 
Update P = r eshape (Ui, i?„_i x I n x J n x R n ). 

Update = reshape(SiVj ,x 7 n +i x J„+i x i?„ + i). 

Compute Cp U+1 ,p = 1,2 by (3.7). 

end 

for n = N — 1,1V — 2,...,2 do 

Perform optimization and matrix factorization similarly 

end 


is until a stopping criterion is met (See (3.18),); 


where B G R JxL is a given matrix in TT format and P € R /Xi . For the above cost 
function, we can construct a similar tensor network as shown in Figure 5. A minimizer 
of the objective function F\(P;B), without constraints, is known as a least squares 
(LS) solution. In the case that A > 0, the solution is unique and it can be expressed 
by 

(3.11) P^ = (AA T + AI/) 1 AB, A > 0. 

On the other hand, if A = 0, then the solution may not be unique and it can be 
expressed by 

(3.12) P5 = (A t ) T B + Z, 

where Z £ R 7xL is any matrix satisfying Z 1 A = 0. If Z = 0, then we call it a 
minimum-norm LS solution, and it is unique. If B = I j, then the unique solution 
is equal to the (transposed) Moore-Penrose pseudoinverse. Furthermore, it has been 
shown that (see [2]) 

P^(A t ) 1 B as A -A 0. 

The solution to the local optimization problem (3.6), which is also a standard LS 
problem, can be written in the same way by 

Pri = (An + AI_R„_ 1 _R-„_R-„ +1 _R„ +1 ) t h n . 


(3.13) 
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Note that the local optimal solution (3.13) is the minimum-norm LS solution for the 
local optimization problem. Note that, from the expression (2.14), taking into account 
that the frame matrix has orthonormal columns, 



3.2.2. Stability and Stopping Criterion. From (3.3), we note that the value 
of the objective function F x ( P) for the global optimization problem (3.1) is exactly the 
same as the value of the objective function for the local optimization problem (3.6) 
neglecting irrelevant constant J. Therefore, we can conclude that the value of the 
original objective function will monotonically decrease during the iteration process. 

Proposition 3.1 (Monotonicity). Let Py t £ R lxj denote the estimated solution 
at the iteration t = 0,1, 2,..., and P^ t+1 £ K /Xj the estimated solution obtained by 
replacing the n and (n + l)th TT-cores of Pa,* with the solution p* = vec{V^ n ’ n+l ' , *) 
to the reduced optimization problem (3.6). Then, 

(3-14) Fa(Pa.i) > F X {P* W ). 


Moreover, we can easily compute the value of the global objective function from 
the value of the local objective function at each iteration. We define 


(3.15) 


rA(P) 


(Ja(p)) 1/2 

Jl/2 ’ 


and call it as the relative residual. Note that the minimum value of F X (P) can be 
expressed in terms of the singular values of the matrix A as follows: 


(3.16) 


77'min 

^A 


mm 

PeR /xJ 


F X (P) = F X (P* X ) = J- 


rank(A.) 

E 

r= 1 


<jf + \ 


where ay are the nonzero singular values of A. So, the minimum value of the relative 
residual is bounded as 


(3.17) 


1 rank( A) < 


min 

PeR JxJ 


Up) = 


^min 

j 


< i, 


where the lower bound can be attained when A = 0, and the upper bound when 
A = oo. 

Given a tolerance parameter e > 0, the stopping criterion of the MALS algorithm 
can be executed when a rate of decrease of the relative residual is smaller than e as 


(3-18) r 2 x (P X}t -N+ 2 ) ~ r x (P x , t ) < e 2 ■ r\{P X}t _ N+2 ). 

However, due to the machine precision, the computed r x value can be not sufficiently 
precise if its value decreases to a very small value relative to the norm ||Ij||f = J 1 ^ 2 ■ 
In this case, it should be computed directly using the matrices A and P represented 
in TT format, rather than indirectly using A„, b„, pi 71 ), and p( n+1 ). 

3.2.3. Selection of Truncation Parameter. The truncation parameter 5 in 
the (5-truncated SVD step affects the accuracy and the convergence speed. If (5 is too 
small, estimated TT-ranks may increase fast and the computational cost can become 
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high. If 5 is too large, on the other hand, the algorithm may not be able to achieve 
desired accuracy. Hence, a S value determines the trade-off between computational 
cost and accuracy. We note that Oseledets [41] considered <5o = (TV — l) -1 / 2 e in the 
context of low-rank approximation, and Lee and Cichocki [39] set 6 = 100<5o for the 
first TV —1 iterations and then set 5 = Sq for the rest of the iterations. In our numerical 
simulations in this paper, we used a fixed value S = 10 _6 (TV — l)" 1 / 2 unless mentioned 
otherwise. 

3.2.4. Conditioning of Local Optimization Problems. Note that the con¬ 
tracted matrix A„ in the expression (3.4) is symmetric and positive definite, and the 
frame matrix has orthonormal columns if the TT-cores (m = 1,..., n— 1) are 
left-orthogonalized and (to = n + 2, ..., TV) are right-orthogonalized. Assuming 
the orthonormality of P^, we can show that (see [28]) 

Amin(AA ) A A m i n (A n ) A A max (A n ) A A max (AA ), 


where A m i n (M) and A max (M) are the minimum and maximum of the eigenvalues of a 
real symmetric matrix M, respectively. Therefore, keeping the TT-cores either left- 
and right-orthogonalized is important for running the MALS algorithm efficiently, 
especially when we use any standard iterative method for solving local optimization 
problems. 

3.2.5. Avoiding Breakdowns. Some of the popular and efficient algorithms 
for computing preconditioners suffer from unexpected failures (which is often referred 
to as breakdowns) during the preconditioner construction step, e.g., in incomplete 
factorization methods [5]. On the other hand, the proposed MALS algorithm is free 
of such risks, because we can freely choose any efficient and reliable method for local 
optimizations. In the experiments, we used the Matlab function gmres as a standard 
iterative method for the computation of the solution to the reduced optimization 
problems (3.6). 

3.2.6. An Effect of Regularization to Convergence. The regularization 
term not only alleviates the ill-posedness of the optimization problem (3.1) (or more 
generally (3.10)), but also improves the convergence property of the proposed algo¬ 
rithm. 

Let be the global solution defined by (3.11), P \ t the estimate at the current 
iteration (t = 0,1,2,...), and Ra.* = AB — (AA 1 + AI/)Pa,( the residual. Since 
AB = (AA t + AI/ )P^ for A > 0, we can derive that (see also [3]) 


(3.19) 


IPa-Pa,* 


If< 


(AA t + AI 7 ) 


-1 


UR'A.tllp < «§(A) 


IIf 


|R 


A 


IIAB 


where k 2 (A) = ||AA T + AI/|| 2 ||(AA T + AI/)” 1 ^ is the spectral condition number of 
the matrix AA T + AI/. It is clear that the larger the A value is, the smaller the values 
of k 2 (A) and ||P^||| become. So, the regularization with A > 0 reduces multiplicative 
factors on the right-hand side of (3.19), which improves the convergence speed of the 
current estimate Pa,* to the global solution P^. 

3.2.7. Preconditioning of Large-Scale Systems of Linear Equations. The 

estimated pseudoinverse can be helpful for preconditioning of system of linear equa¬ 
tions (see, (1.2).) By using the estimated preconditioner, we can convert overde¬ 
termined or underdetermined systems of linear equations into well-posed determined 
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systems. In addition, any nonsymmetric data matrix A can be converted to a square 
symmetric matrix approximately, as stated in the following proposition. 

Proposition 3.2 (Symmetricity). Let A £ K /Xj denote a given matrix with 
I > J, £ WL IxJ the minimizer of F x ( P) defined in (3.11) and (3.12) with B = I j, 
pmm m inimum value defined in (3.16), and 

(3.20) G A (P) =F X (P)-Fr n - 

Then, it holds that, for any A > 0 and P £ R 7Xl7 , 

(3.21) ||P t A - A T P||p < 2G A (P) - 2A ||P - P* X \\ 2 F . 

For the proof of Proposition 3.2, we formulate the following lemma, which can be 
derived immediately after some algebraic manipulation. 

Lemma 3.3. For A > 0, 

(3.22) G A (P) = F a (P) - F x ( P* x ) = ||P t A - Pf A||* + A ||P - P* X \\ 2 F . 


Proof. Proof of Proposition 3.2. From Lemma 3.3, it follows that 

||P t A - A T P||p = ||P t A - P^ t A + P^ t A - A T P||p < 2 ||P t A - P^ T A||p 
= 2G A (P) — 2A ||P — P A ||p . 


□ 

The distribution of the eigenvalues and the singular values of the preconditioned 
matrix P 1 A affects the convergence of an iterative method. The following theorem 
states that the eigenvalues and the singular values of P 1 A obtained by the proposed 
method can be made close to the eigen/singular values of the matrix P a t A by de¬ 
creasing the G a (P) value. 

Theorem 3.4. Let A,P^ £ 1RL IxJ (/ > J) and G A (P) be defined as in Proposi¬ 
tion 3.2. Then, for any A > 0 and P £ M. lxj , 

(3.23) £ |A r ([P T A] s ) - A r (Pf A)| 2 < G A (P) - A ||P - P* x f F , 

r =1 

(3.24) £ \a r (P T A) - a r (P? A)| 2 < G A (P) - A ||P - P* x f F , 

r —1 

where [N]s = (N + N T )/2, A r (M) are the eigenvalues of a real symmetric matrix M, 
oy(N) are the singular values of a matrix N, and both A r (M) and ay (N) are arranged 
in decreasing order. 

Proof. Since for any M £ R jxj , it holds that ||M|||— ||[M]s||| = ||M—M T |||/4 > 
0, we can derive, by the substitution M = P 1 A — P^ 1 A, that 

||P t A - Pf A||p > || [P t A] s - Pf A||p . 

We can expand the right hand side of the above inequality as 

(3.25) ||[P t A]s - P A T A||p = ||[P T A]s||p + ||P A T A||p - 2 • trace ([P t A]s ■ A T P!) . 
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Note that, for a matrix M G R' /Xj with a Shur decomposition M = QTQ 1 , we have 
l|M||p = ||T||p > ||diag(T)||| = £ |A r (M)| 2 . 

r= 1 

As it is stated in [37, Lemma II. 1], the following relations hold: for any JxJ Hermitian 
matrices M and N, 

j J 

^A r .(M)A J _ r+ i(N) < trace(MN) < ^ A r (M)A r (N). 

r— 1 r— 1 

By using the above inequalities, we can derive that 

(3.26) 

|| [P T A] s - Pf A\\l > J2 (|Ar([P T A] s )| 2 + |A r (Pf A)| 2 - 2Af[P T A] s )A r (Pf A)) 

r—1 

= ^|A r ([P T A] s )-A r (Pf A)| 2 . 

r—1 

The result in (3.23) follows using Lemma 3.3. 

To prove the result in (3.24), we use the expression 

(3.27) ||P t A - Pf A|| 2 = ||P T A||p + ||Pf A|| 2 - 2 • trace (P T A • A T Pf . 

For a matrix M G R jxj , we have ||M||| = ff =1 f r (M)| 2 . By applying the von 
Neumann’s trace inequality [40], which states that, for JxJ matrices M and N, 

J 

|trace(MN)| < £V r (M)<r r (N), 

r= 1 


we can finally derive that 

||P t A - Pf A|| 2 > ^ f r (P T A) - a r (Pf A)| 2 . 

r= 1 

Similarly, we can obtain the result in (3.24) by using Lemma 3.3. □ 

From (3.23) of Theorem 3.4, we can say that the eigenvalues of the matrix [P T A]g 
can become close to those of the symmetric positive semidefinite matrix Pf A when 
G\( A) is small enough. If A is of full column rank, then the smallest eigenvalue 
of [P T A]s can approach to Aj(Pf A) = cr 2 /(cr 2 + A) >0, where (Jj = of A), so 
the preconditioned matrix P T A can also become positive definite. In addition, from 
(3.24) of Theorem 3.4, the spectral condition number of P 1 A also can be made close 
to that of Pf A by decreasing G>(P) gradually. 

3.2.8. Computational Complexity. The most time-consuming step in the 
MALS algorithm is the optimization step for solving the reduced optimization prob¬ 
lems (3.6). Let Q = max„(/ n , J ra ), R = max„(i? ra ), and Ra = max„(iif). For a fast 
computation of the solution, we can apply standard iterative methods such as the 
gmres in Matlab for solving the system of linear equations (A n + AIfl ll _ 1 if n if n+1 /j n+1 )x 
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Fig. 6. A procedure for the sequential computation of the matrix-by-vector multiplica¬ 
tion A. n x. with a vector x E 'R R n-i I n J n I n+i J n+i R n+i 3 which can be carried out by a sequen¬ 
tial contraction of the tensors {C^ n , X, A^ n \ A^ n \ A^ n ^~ 1 \ 7Z> n + 1 }, where X E 

^R n -ixi n x Jn,xin+i* J-n+i*Rn+i i s a 6th-order tensor. The mode sizes are denoted for simplic¬ 
ity by Q = max n (/ n , J„), R = max n (i? n ), and Ra = max n (i?jJ). 


= b n . In this case, the matrix A n does not need to be computed explicitly, instead 
the matrix-by-vector multiplication A n x can be computed faster by the gradual (re- 
cursive) contraction of the tensors {Cf n , X, A (n) , A (n) , A (ll+1) , A (n+1) , 7£f n +^}, as 
illustrated in Figure 6. The computational complexity for the multiplication A„x is 
0(R 3 R\Q 4 + R 2 R\Q 5 ). Since the computational cost for each iteration is indepen¬ 
dent of the order N if R, Ra, and Q are bounded, the total computational cost for 
optimizing every TT-cores is logarithmic in the matrix size Q N x Q N . 

On the other hand, the explicit computation of the matrix A„ can be performed 
by the iterative contraction of the tensors {£f n , A (n) , A (n) , A (n+1) , A (li+1) , ^ 1 >n+1 }, 
and its computational complexity is 0(R 4 R\Q 4 + R 2 R 3 A Q 5 ). Moreover, a direct 
method, such as the LU factorization or the pseudoinverse, for solving the system 
A„x = b„ can cost up to 0(R 6 Q 6 ). 

The estimated preconditioner is reusable and does not need to be updated during 
the process of solving the preconditioned system of linear equations. However, the 
preconditioner P in the TT format needs to take small TT-ranks because the product 
P 1 A increases the TT-ranks [41]. Note that in the case of sparse approximate inverse 
preconditioning techniques [5], similarly, the product P T A would deteriorate sparsity 
of A even if both P and A are sparse. In such cases, the preconditioning can be 
performed implicitly. 

4. Numerical Simulations. In the simulation study, we considered several ma¬ 
trices including rectangular matrices and nonsymmetric matrices in order to check va¬ 
lidity and evaluate performance of the proposed MALS algorithm. We also compared 
the proposed method with an alternative method which applies a “standard” MALS 
method to solve the large scale system of linear equations (see also [43]) 

(4.1) (IjigiAA 1 + AI/<g)Ij)vec(P) = vec(A). 

In practice, the matrix IjigiAA 1 is represented in matrix TT format with 4th-order 
TT-cores of the sizes (R^-i) 2 x I n J n x I n J n x ( R ^) 2 , n = 1,2, ...,7V. Due to 
the relatively large sizes of the TT-cores, the computational cost for the matrix-by¬ 
vector multiplication in the standard MALS algorithm has complexity of 0(R 3 R\Q 4 + 
R 2 R\Q 6 ) [43], _ 

The TT-ranks of the estimated pseudoinverse were bounded by R = (50,..., 50). 
We repeated the simulations 30 times with random initializations and averaged the 
results. In the simulation results, we calculated the value of the relative residual r\ 
(3.15) as described in Section 3.2.2. Our code was implemented in Matlab. We used 
the Matlab version of TT-Toolbox [42] for building and manipulating TT formats for 
large-scale matrices and vectors. Simulations were performed on a desktop computer 
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Fig. 7. (a) The (unordered) prescribed eigen/singular values aj of the 2 ;V x 2 :V circulant matrix 
C and (b) the entries Ck of C, with N = 5 and various values of B = 0.3, 0.5, 0.7. The rectangular 
matrix A € R 2 + x2 has the same singular values as C. 


with an Intel Core i7 4960X CPU at 3.60 GHz with 32 GB of memory running 
Windows 7 Professional and Matlab R2010b. 


4.1. Example 1: Rectangular Circulant Matrices with Prescribed Sin¬ 


gular Values. Rectangular matrices A £ 


x2 


were defined by A = 


where C = [ci-j]ij £ R 2 x2 are circulant matrices, i.e. , Ck = Ck- 2 » , k = 0, 1,..., 2 N — 
1, whose entries were determined in the following way. The (unordered) eigen/singular 
values of C were first prescribed by, with J = 2 N , (see, Figure 7(a)) 


Tj = fti) 


1 

— max 
B 



j -0.5 


+ B- 0.5 


B > 0, j = 0,..., J - 1. 


Then, the entries Ck of the circulant matrix C were determined by the discrete Fourier 
inverse transform (IFFT) of the eigen/singular values [13], i.e., (see, Figure 7(b)) 



The very large scale vector \ck\k represented in TT format can be efficiently com¬ 
puted by the QTT-FFT algorithm [16], which is available in TT-Toolbox [42], By 
construction, the matrix A has the same singular values as C. We note that A is 
ill-conditioned when B < 0.5. 

Figure 8 illustrates that the relative residual decreases monotonically as the iter¬ 
ation proceeds. From Figure 8(a), we can see that the proposed algorithm converges 
relatively fast within one or two full sweeps (1 full sweep is equal to 2(TV — 2) itera¬ 
tions) in the case that A > 0, and the TT-ranks of the estimated pseudoinverse also 
remain at low values. On the other hand, without regularization (i.e., A = 0), we can 
see that the convergence is slow, and the TT-ranks also grow very quickly during the 
iteration process. 

For comparison of the computational costs, we set the tolerance parameter at 
e = 0.2. Figure 9 illustrates the computational costs and the estimated TT-ranks 
by the proposed MALS algorithm (MALS-PINV) and the standard MALS algorithm 
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N=50, B=0.5 



(a) Relative residual 


(b) Maximum of TT-ranks 


Fig. 8. Convergence of the proposed MALS algorithm for various values of the regularization 
parameter A for the 2 N+1 x 2 N rectangular circulant matrices with prescribed singular values and 
N = 50, B = 0.5. (a) Relative residual versus iteration, and (b) maximum of the TT-ranks of the 
estimated pseudoinverse versus iteration. The markers on the lines indicate half-sweeps, i.e., every 
N — 2 iterations. 


(Std MALS), for various values of 20 < N < 100 and B G {0.3,0.5,0.7}. Some values 
of the computational time are not displayed in the figure if it was larger than 360 
seconds. We can see that the computational costs increased only logarithmically with 
the matrix size 2 W+1 x 2^. The computation time of the proposed MALS algorithm 
was much smaller than that of the standard MALS algorithm (typically, by one or 
even two orders) as analyzed theoretically in Section 3.2.8 and in the first paragraph 
of Section 4. Moreover, the computation time for the regularized optimization with 
A > 0 was shorter than that for the optimization without regularization, i.e., A = 0. 
In addition, for the case of ill-conditioned matrices with either B = 0.3 or B = 
0.5, no regularization with A = 0 resulted in large estimated TT-ranks and high 
computational costs, whereas the regularization with A > 0 significantly reduced 
estimated TT-ranks and also computational costs. 

4.2. Example 2: Randomly Generated Matrices with Prescribed Singu- 

c\N c\N 

lar Values. Nonsymmetric matrices A G R 2 x2 ' were constructed in A-dimensional 
matrix TT format using 

A = USV T , 

where S = diag(cro, a i, • • •, o^-i) is a diagonal matrix of singular values, and U and 
V are 2 W x 2^ orthogonal matrices with TT-ranks equal to 1, i.e., 

U = U (1) <g> • • • <8> U (Ar >, V = V (1) <g> • • • <8> V (Ar) , 

and U<"> G R 2x2 and G R 2x2 are randomly generated orthogonal matrices. The 
singular values were determined by 

^=10“^, i = 0,- 1, 

for fixed 0 < Kq < 1. The largest singular value is equal to 1, and the singular values 
decay to zero as the index j increases in a rate determined by Kq. The TT-ranks of 
the matrix A and the inverse A -1 are the same as those of E and S -1 , respectively, 
which are = R„ =1 (1 < n < N — 1). 























PSEUDOINVERSE COMPUTATION USING TENSOR TRAIN 


21 






(a) B = 0.3 (b) B = 0.5 (c) B = 0.7 


Fig. 9. Computational cost versus N (top row) and maximum of the estimated TT-ranks 
versus N (bottom row), for the 2 N+1 x 2 N rectangular circulant matrices with various values of 
20 < N < 100, B G {0.3, 0.5, 0.7}, and A E {0,0.1}. Some values of the computational time are 
not displayed in the figure if it was larger than 360 seconds. Std MALS means the standard MALS 
method applied for solving the linear system (4.1), and MALS-PINV means the proposed MALS 
method. 


Figure 10 illustrates the convergence of the proposed MALS algorithm for various 
values of the regularization parameter A for the matrices with N = 50 (2 50 x 2 50 ~ 
10 15 x 10 15 ) and Kq = 0.5. The convergence was monotonic, i.e., the relative residual 
was nonincreasing during the iteration process. The larger A values resulted in larger 
relative residual values, as described in Section 3.2.2. Since the matrix A is not ill- 
conditioned when K 0 = 0.5, the convergence was relatively fast, and the estimated 
TT-ranks were small. In Figure 10(b), when A = 0 the estimated TT-ranks were 
R n = 1, which are equal to the true TT-ranks of the inverse A -1 . 

Figure 11(a) shows that the obtained minimal relative residual values were not 
different for various values of A, but they were different over various values of K$. 
In addition, Figure 11(b) shows that the relative residual values were different for 
various values of the regularization parameter A. The simulation results illustrated 
in Figures 11(a) and (b) are consistent and in agreement with the analysis in Sec¬ 
tion 3.2.2. 

4.3. Example 3: Laplace Operator. The 1-D discrete Laplace operator of 
size 2 N x 2 N with Dirichlet-Dirichlet boundary condition is considered [30]: 

A = tridiag(—1, 2, -1) G 

The matrix A is square and symmetric, and its explicit TT representation and TT- 
ranks were already investigated and presented in [30]. The TT-ranks of the inverse 
A -1 are also known to be (i?i, I? 2 ,..., Rn-i) = (4, 5, 5,..., 5,4) [30]. 

Figure 12 illustrates the convergence of the MALS algorithm for various values 
of A G {0,10 -6 ,10 -4 ,10~ 2 } and a fixed N = 60. The relative residual decreased 
monotonically, but the convergence was slow when the A value was small in the range 
A G {0,10~ 6 }. In Figure 12(b), for the small A values, the maximum of the estimated 
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(a) Relative residual 


N=50, K q =0.5 



(b) Maximum of TT-ranks 


Fig. 10. Convergence of the MALS algorithm for various values of the regularization parameter 
X for the 2 N x 2 N randomly generated matrices with prescribed singular values and N = 50 and 
Kq = 0.5. (a) Relative residual versus iteration, and (b) maximum of the TT-ranks of the estimated 
pseudoinverse versus iteration. The markers on the lines indicate half-sweeps, i.e., every N — 2 
iterations. 




(a) A = 0, fV G {10,..., 50} (b) A = {0,..., 0.01}, TV = 50 

Fig. 11. (a) Relative residual (r\) versus Kq for various values of N, and (b) relative residual 
(r\) versus Kq for various values of the regularization parameter X, for the 2 N x 2 N randomly 
generated matrices with prescribed singular values. 


TT-ranks increased largely during the iteration process, whereas for the larger A 
values (A = 10 _4 ,10~ 2 ), the TT-ranks were relatively small. Note that the values 
A = 10 -4 ,10~ 2 are still relatively small compared to the diagonal entries of AA 1 
in (3.11), which are 5 or 6. We can conclude that the regularization with a A > 0 
value is necessary to obtain low-rank approximate pseudoinverses of ill-conditioned 
large-scale matrices. 

The computational costs for the estimation of the pseudoinverses are illustrated in 
Figure 13 together with the estimated maximum TT-ranks. It is illustrated that the 
computational time increased logarithmically with the matrix size 2 N x 2 W , and the 
estimated TT-ranks were bounded over all TV values. Moreover, the proposed MALS 
algorithm needs shorter computational time than the standard MALS algorithm. 

4.4. Example 4: Convection-Diffusion Equation. In order to demonstrate 
the effectiveness of the proposed algorithm in preconditioning nonsymmetric systems 
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N=60 



N=60 



(a) Relative residual 


(b) Maximum of TT-ranks 


Fig. 12. Convergence of the proposed MALS algorithm illustrated by (a) relative residual (r\) 
and (b) maximum of the TT-ranks of the estimated pseudoinverses, for various values of the regu¬ 
larization parameter A for the 2 60 x 2 60 discrete Laplace operator (N = 60J. The markers on the 
lines indicate half-sweeps, i.e., every N — 2 iterations. 



N N 


(a) Computation time 


(b) Maximum of TT-ranks 


Fig. 13. (a) Computation time and (b) maximum of the TT-ranks of the estimated pseudoin¬ 
verse, for the 2 n x 2 n discrete Laplace operators for various values of the regularization parameter 
A G {0,10 —2 }. Std MALS means the standard MALS method applied for solving the linear system 
(4.1), and MALS-PINV means the proposed MALS method. 


of linear equations, we consider the 3-D convection-diffusion equation on the unit cube 
[0, l] 3 described in [49]: 

(4.2) Uxx 4” 'U'yy 4” ^zz 4“ CU X = 
where the function / is defined by the solution 

(4.3) u(x, y, z) = exp (xyz) sin(7rx) sin(7ry) sin(7rz). 

By finite difference discretization, each axis is discretized by 2 M + 2 grid points in¬ 
cluding the boundary points. The number of equations is 2 M • 2 M • 2 M = 2 3M = 2 N 
with N = 3 M. We set c = 2 JV_1 °. It is noted in [49] that the matrix A is a strongly 
nonsymmetric matrix, and it has eigenvalues with large imaginary parts, which slow 
the convergence of conjugate gradient-type algorithms such as Bi-CGSTAB. 

First, the regularized inverses of the coefficient matrix were estimated by the 
proposed algorithm. Figure 14 illustrates the convergence of the proposed algorithm 
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Fig. 14. Convergence of the proposed MALS algorithm for various values of the truncation pa¬ 
rameter 5 = 2 • 10~ 3 ,2 • 10 —5 ,2 • 10 —7 for the discretized convection-diffusion equation with N = 30 
and A = 10 —4 . (a) Relative residual and (b) maximum of the TT-rank of the estimated pseudoin¬ 
verse. The markers on the lines indicate half-sweeps, i.e., every N — 2 iterations. 




(a) Computation time (b) Maximum of TT-ranks 



(c) Relative residual 


Fig. 15. (a) Computation time, (b) maximum of the estimated TT-ranks, and (c) relative resid¬ 
uals obtained by the proposed MALS algorithm for the 2 N x 2 N coefficient matrix of the convection- 
diffusion equation for various values of the regularization parameter A = 10 6 ,10 —4 ,10 —2 . 


for three truncation parameter values: S = 2 ■ 10~ 3 ,2 • 10~ 5 , and 2 • 10~'. We can 
see that smaller 6 values result in a faster convergence per each iteration, but the 
TT-ranks also increase faster, which may cause high computational costs. On the 
other hand, a too large value of <5 may result in large relative residuals. Since the 
TT-ranks of the estimated pseudoinverse influence the computational cost in the next 
step of solving a preconditioned system of linear equations, it is important to balance 
the approximation accuracy and TT-ranks of the estimated pseudoinverse. 

Figure 15 illustrates the computation time for the estimation of the regularized 
pseudoinverses, for various values of N and regularization parameter A, when the 
truncation parameter was set at <5 = 10 _4 (fV — l) -1 / 2 . Large A value, e.g., A = 0.01, 
resulted in relatively small TT-ranks and short computation time for N = 15, 90. 

Next, we computed solutions to the convection-diffusion equation (4.2) numer¬ 
ically by using the function dmrg_solve2 [43] in TT-Toolbox [42], where the linear 
equation was either preconditioned or not by the estimated regularized pseudoinverse. 
For solving local optimization problems in the dmrg_solve2, we used one of the three 
different Matlab functions, gmres, bicgstab, and peg, in order to compare them 
for solving nonsymmetric systems of linear equations. However, we found almost no 
differences in performances between them in this simulation, so, we only presented 
the results of the bicgstab. The dmrg_solve2 algorithm converged to the relative 
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(a) A = 10° (b) A = 10” 6 


Fig. 16. Comparison of the performances of the numerical solution algorithm (dmrg_solve2 
[f.2, 431) f or solving of the linear systems without preconditioning (DMRG(A)) and preconditioned 
systems (DMRG(P T A)) described in Section 4-4■ F° r local optimizations for dmrg_solve2, the 
Matlab function bicgstab was applied for the cases that (a) X = 10° and (b) A = 10 —6 . 


residual tolerance of 10~ 4 within 20 full-sweeps mostly (one full-sweep is equivalent 
to solving the local problems 2(iV — 2) times) for 15 < N < 90. The computational 
costs for solving the linear systems are illustrated in Figures 16(a) and (b), where the 
preconditioned systems were solved much faster than without preconditioning. The 
estimated TT-ranks of the solution x ranged between 10 and 15 and were almost con¬ 
stant as N increased (although not presented here), implying that the computational 
costs were affected mostly by the convergence of the local algorithm. In addition, it 
should be noted that the computational costs for solving the square linear systems as 
illustrated in Figure 16 were lower than the costs for the preconditioner computation 
in Figure 15(a). 

5. Conclusion and Discussion. We presented a new MALS algorithm for the 
computation of approximate pseudoinverses of extremely large-scale structured ma¬ 
trices using low-rank TT decomposition. The proposed method can estimate the 
Moore-Penrose pseudoinverses of any nonsymmetric or nonsquare structured matrices 
in low-rank matrix TT format approximately, so it can be useful for preconditioning 
overdetermined or underdetermined large-scale systems of linear equations. 

The proposed method provides stability and the fast convergence speed even for 
very ill-conditioned large-scale matrices by regularization. The regularized solutions 
were shown to have relatively small TT-ranks in the numerical simulations, so the 
computational costs for the construction of preconditioners and solution to huge sys¬ 
tems of linear equations were significantly smaller than without regularization. The 
regularization technique is especially important when the size of a data matrix is huge 
and ill-conditioned. 

The proposed algorithm converts the large-scale minimization problem into se¬ 
quential smaller-scale optimization problems to which any standard optimization 
methods can be applied. The convergence to the desired solution is stable and rela¬ 
tively fast because the TT-ranks of the approximate inverses can be adaptively deter¬ 
mined during the iteration process, and the decrease in the objective function value 
is monotonic. The computational cost for running the proposed MALS algorithm is 
logarithmic in the matrix size under the assumption of boundedness of TT-ranks. 

The estimated pseudoinverses were applied to preconditioning of the strongly 
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nonsymmetric matrix occurring in systems of linear equations in the convection- 
diffusion equation problem. Several standard iterative algorithms such as GMRES, 
Bi-CGSTAB, and PCG showed the improved convergence in the numerical simula¬ 
tions, which demonstrate the effectiveness of the proposed algorithm by precondition¬ 
ing and symmetrizing the coefficient matrix. 

The main advantage of the proposed method lies in its applicability to any rect¬ 
angular huge structured matrices. Moreover, the regularization technique employed 
in the proposed method helps to compute approximate pseudoinverses reliably for any 
ill-conditioned structured matrices which admit low-rank TT approximations. 

The developed algorithm can further be directly applied to the following ar¬ 
eas. First, the computation of regularized Moore-Penrose pseudoinverses is closely 
related to the regularized (filtered) solution of systems of linear equations Ax = b, 
by x = P^ T b, where P^ T « [7, 8]. Second, the large scale generalized eigenvalue 

decomposition (GEVD) problem described by Ax = ABx for a square matrix A 
and a nonsingular square matrix B [21] can be transformed to a standard eigenvalue 
decomposition problem B x Ax = Ax if the large-scale inverse matrix B 1 can be 
approximately computed in TT format efficiently. Once the large-scale matrices A 
and B 1 are represented in TT format, the multiplication B _1 A can be relatively 
easily performed [41]. Third, a special case of the optimization problem (3.10) arises 
in important subspace clustering problems [54], which can also be efficiently solved 
using the proposed algorithm based on TT decompositions. 

REFERENCES 

[1] R. Andreev and C. Tobler, Multilevel preconditioning and low-rank tensor iteration for 

space-time simultaneous discretizations of parabolic PDEs, Numer. Linear Algebra Appl., 
22 (2015), pp. 317-337. 

[2] J. C. A. Barata and M. S. Hussein, The Moore-Penrose pseudoinverse: A tutorial review of 

the theory , Braz. J. Phys., 42 (2012), pp. 146-165. 

[3] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. M. Donato, J. Dongarra, V. Eijkhout, 

R. Pozo, C. Romine, and H. Van der Vorst, Templates for the Solution of Linear 
Systems: Building Blocks for Iterative Methods, 2nd Edition, SIAM, Philadelphia, 1994. 

[4] D. Braess and W. HACKBUSCH, Approximation ofl/x by exponential sums in [l,oo), IMA J. 

Numer. Anal., 25 (2005), pp. 685-697. 

[5] M. Benzi and M. Tuma, A comparative study of sparse approximate inverse preconditioners, 

Appl. Numer. Math., 30 (1999), pp. 305-340. 

[6] G. Beylkin and M. J. Mohlenkamp, Numerical operator calculus in higher dimensions , Proc. 

Natl. Acad. Sci. USA, 99 (2002), pp. 10246-10251. 

[7] J. Chung and M. Chung, An efficient approach for computing optimal low-rank regularized 

inverse matrices , Inverse Problems, 30 (2014), 114009. 

[8] J. Chung, M. Chung, and D. P. O’Leary, Optimal regularized low rank inverse approxima¬ 

tion , Linear Algebra Appl., 468 (2015), pp. 260-269. 

[9] A. ClCHOCKl, Era of big data processing: A new approach via tensor networks and tensor 

decompositions, arXiv: 1403.2048, 2014. 

[101 A. ClCHOCKl, Tensor networks for biq data analytics and larqe-scale optimization problems , 
arXiv: 1407.3124, 2014. 

[11] A. ClCHOCKl, R. Zdunek, A. H. Phan, and S. Amari, Nonnegative Matrix and Tensor Fac¬ 

torizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Sep¬ 
aration, Wiley, Chichester, 2009. 

[12] X. Cui AND K. Hayami, Generalized approximate inverse preconditioners for least squares 

problems, Japan J. Indust. Appl. Math., 26 (2009), pp. 1-14. 

[13] P. J. Davis, Circulant matrices, Wiley, New York, 1979. 

[14] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decom¬ 

position, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253-1278. 

[15] S. V. Dolgov, TT-GMRES: Solution to a linear system in the structured tensor format, 

Russian J. Numer. Anal. Math. Model., 28 (2013), pp. 149-172. 


PSEUDOINVERSE COMPUTATION USING TENSOR TRAIN 


27 


[16] S. Dolgov, B. Khoromskij, D. Savostyanov, Superfast Fourier transform using QTT ap¬ 

proximation, J. Fourier Anal. Appl., 18 (2012), pp. 915-953. 

[17] S. V. Dolgov and D. V. Savostyanov, Alternating minimal energy methods for linear systems 

in higher dimensions , SIAM J. Sci. Comput., 36 (2014), pp. A2248-A2271. 

[18] M. Espig, W. Hackbusch, S. Handschuh, and R. Schneider, Optimization problems in 

contracted tensor networks , Comput. Vis. Sci., 14 (2011), pp. 271-285. 

[19] A. Falco and W. Hackbusch, On minimal subspaces in tensor representations , Found. Com¬ 

put. Math., 12 (2012), pp. 765-803. 

[20] L. Giraldi, A. Nouy, and G. Legrain, Low-rank approximate inverse for preconditioning 

tens or-structured linear systems, SIAM J. Sci. Comput., 36 (2014), pp. A1850-A1870. 

[21] G. H. Golub and C. F. Van Loan, Matrix Computations, Third Edition, Johns Hopkins 

University Press, Baltimore, 1996. 

[22] L. Grasedyck, Hierarchical singular value decomposition of tensors , SIAM J. Matrix Anal. 

Appl., 31 (2010), pp. 2029-2054. 

[23] L. Grasedyck, D. Kressner, and C. Tobler, A literature survey of low-rank tensor approx¬ 

imation techniques , GAMM-Mitt., 36 (2013), pp. 53-78. 

[24] M. J. Grote and T. Huckle, Parallel preconditioning with sparse approximate inverses , SIAM 

J. Sci. Comput., 18 (1997), pp. 838-853. 

[25] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer, Berlin, 2012. 

[26] W. Hackbusch and S. Kuhn, A new scheme for the tensor representation. J. Fourier Anal. 

Appl., 15 (2009), pp. 706-722. 

[27] S. Holtz, T. Rohwedder, and R. Schneider, On manifolds of tensors of fixed TT-rank, 

Numer. Math., 120 (2012), pp. 701-731. 

[28] S. Holtz, T. Rohwedder, and R. Schneider, The alternating linear scheme for tensor opti¬ 

mization in the tensor train format, SIAM J. Sci. Comput., 34 (2012), pp. A683-A713. 

[29] V. Kazeev, M. Khammash, M. Nip, and C. Schwab, Direct solution of the chemical mas¬ 

ter equation using quantized tensor trains, PLoS Comput. Biol., 10 (2014), el003359. 
doi: 10.1371 /journal.pcbi. 1003359 

[30] V. A. Kazeev and B. N. Khoromskij, Low-rank explicit QTT representation of the Laplace 

operator and its inverse, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 742-758. 

[31] B. N. Khoromskij, Tens or-structured preconditioners and approximate inverse of elliptic op¬ 

erators in R d , Constr. Approx., 30 (2009), pp. 599-620. 

[32] B. N. Khoromskij, 0(dlogN)-quantics approximation of N-d tensors in high-dimensional 

numerical modeling, Constr. Approx., 34 (2011), pp. 257-280. 

[33] B. N. Khoromskij, Tensors-structured numerical methods in scientific computing: Survey on 

recent advances, Chemometr. Intell. Lab. Syst., 110 (2012), pp. 1—19. 

[34] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 

(2009), pp. 455-500. 

[35] D. Kressner, M. Steinlechner, and A. Uschmajew, Low-rank tensor methods with sub¬ 

space correction for symmetric eigenvalue problems, SIAM J. Sci. Comput., 36 (2014), 
pp. A2346-A2368. 

[36] D. Kressner and C. Tobler, Preconditioned low-rank methods for high-dimensional elliptic 

PDE eigenvalue problems, Comput. Methods Appl. Math., 11 (2011), pp. 363-381. 

[37] J. B. Lasserre, A trace inequality for matrix product, IEEE Trans. Autom. Control, 40 (1995), 

pp. 1500-1501. 

[38] N. Lee and A. Cichocki, Fundamental tensor operations for large-scale data analysis in tensor 

train formats, arXiv: 1405.7786, 2014. 

[39] N. Lee and A. Cichocki, Estimating a few extreme singular values and vectors for large-scale 

matrices in tensor train format, SIAM J. Matrix Anal. Appl., 36 (2015), pp. 994-1014. 

[40] L. Mirsky, A trace inequality of John von Neumann, Monatsh. Math., 79 (1975), pp. 303-306. 

[41] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput., 33 (2011), pp. 2295-2317. 

[42] I. V. Oseledets, MATLAB TT-Toolbox, Version 2.3, June 2014, 

https: //github.com/oseledets/TT-Toolbox. 

[43] I. V. Oseledets and S. V. Dolgov, Solution of linear systems and matrix inversion in the 

TT-format, SIAM J. Sci. Comput., 34 (2012), pp. A2718-A2739. 

[44] I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to 

use SVD in many dimensions, SIAM J. Sci. Comput., 31 (2009), pp. 3744-3759. 

[45] I. V. Oseledets, E. E. Tyrtyshnikov, and N. L. Zamarashkin, Tensor-train ranks of ma¬ 

trices and their inverses, Comput. Meth. Appl. Math., 11 (2011), pp. 394-403. 

[46] T. Rohwedder and A. Uschmajew, On local convergence of alternating schemes for opti¬ 

mization of convex problems in the tensor train format, SIAM J. Numer. Anal., 51 (2013), 
pp. 1134-1162. 


28 


N. LEE AND A. CICHOCKI 


[47] D. V. Savostyanov, S. V. Dolgov, J. M. Werner, and I. Kuprov, Exact NMR simulation 

of protein-size spin systems using tensor train formalism, Phys. Rev. B, 90 (2014), 085139. 

[48] M. Signoretto, Q. T. Dinh, L. De Lathauwer, and J. A. K. Suykens, Learning with 

tensors: A framework based on convex optimization and spectral regularization , Machine 
Learning, 94 (2014), pp. 303-351. 

[49] P. SONNEVELD AND M. B. VAN GlJZEN, IDR(s): A family of simple and fast algorithms for 

solving large nonsymmetric linear systems, SIAM J. Sci. Comput., 31 (2008), pp. 1035- 
1062. 

[50] L. Sorber, I. Domanov, M. Van Barel, and L. De Lathauwer, Exact line and plane search 

for tensor optimization, Comput. Optim. Appl. (2015), pp. 1-22. doi:10.1007/sl0589-015- 
9761-5 

[51] U. Schollwock, The density-matrix renormalization group in the age of matrix product states, 

Ann. Phys., 326 (2011), pp. 96-192. 

[52] N. Vervliet, O. Debals, L. Sorber, and L. De Lathauwer, Breaking the curse of dimen¬ 

sionality using decompositions of incomplete tensors: Tensor-based scientific computing 
in big data analysis, IEEE Signal Process. Mag., 31 (2014), pp. 71-79. 

[531 S. R. White, Density-matrix algorithms for quantum renormalization qroups, Phys. Rev. B., 
48 (1993), pp.10345-10356. 

[54] Y.-L. Yu AND D. Schuurmans, Rank/norm regularization with closed-form solutions: Appli¬ 
cation to subspace clustering, in Proceedings of the Twenty-Seventh Conference on Un¬ 
certainty in Artificial Intelligence, F. Cozman and A. Pfeffer, eds., AUAI Press, Corvallis, 
Oregon, 2011, pp. 778-785. 


N=50, B=0.3 


1 

0.9 

0.8 

0.7 

0.6 


0.5 


0.4 


0 


* 







7 






\ - V 

-v 

- V- 

-V- 

-v 


- 1 _ 

in i 

II 1 1 

II 1 1 

II 1 1 

II 1 1 

«>f 

-I__ 

= = = = & 

II 1 1 

II 1 1 

II 1 1 

II 1 1 

a$>+ 

iii i 

-+ 

= = = =& 

v 1=0.1 
+ 1=0.01 

A 1=0.001 

x 1=0.0001 

O 1=0 

1 






_ 

_ 



50 


100 150 200 250 300 

Iteration 






















10 

10 

10 

10 

10 


N=50, B=0.7 


i 

\r -V 

-V- 

-1_. 

-V 

-+- 

-v- 

-+- 

- -v 

-+ 


-,A 

- -X- 

-- A- 

- - -X- 

-A- 

--x - 

-A- 

-x - 

--A 

- X 


i 

' - - i 

i 

i 

i 

i 






i 

■-'O 

-o- 

-0- 

-Q - 

- o 


50 100 150 200 250 300 

Iteration 




















N=60 






Iteration 


0 









































N=60 



Iteration 


0 





































TT-Ranks 


N=60 


QC 


X=0.01 
HX=0.0001 
A-X=1e-006 
e-i=o 



20 30 40 

n=0,1,...,N 
















































































































IT- Ranks 


N=60 



X = 0.01 
-^= 0.0001 
^=1e-006 



20 40 

n=0, 


60 















































10 

10 

10 

10 


-x- DMRG(A) + GMRES 
-o DMRG(A) + BICGSTAB 
-O DMRG(A) + PCG 


DMRG(P t A) + GMRES 
DMRG(P t A) + BICGSTAB 
DMRG(P T A) + PCG 



30 45 60 75 90 

Dimension, N 



























































